unit ball
their feedback in the final version of the paper. 3 Below, we address the questions of Reviewer # 3 regarding the definition of smoothness
We thank all the reviewers for their thoughtful feedback. Below, we address the questions of Reviewer # 3 regarding the definition of smoothness. Next we address (2) and compare our definition to Speilman and Teng's definition of smoothness. Let us start by highlighting the similarities between our model of smoothness and Speilman and Teng's. On the other hand, Speilman and Teng's model does this
Statistical and Topological Properties of Sliced Probability Divergences S
We can now prove Theorem 1. Proof of Theorem 1. Now, let us prove the other implication, i.e. Theorem 2. Our result is thus consistent with the existing results in the literature. Next, we show that this result holds for two popular choices of kernels. We conclude that k ห k is positive definite, hence (S17) holds for RBF kernels.S1.4 Proof of Theorem 3 Proof of Theorem 3. We start by upper bounding the distance between two regularized measures. The desired result is obtained as a direct application of Theorems 2 and 3.S1.6
Universal Architectures for the Learning of Polyhedral Norms and Convex Regularizers
Unser, Michael, Ducotterd, Stanislas
This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those that can be expressed as a power of a seminorm. We then show that such functionals can be approximated to arbitrary precision with the help of polyhedral norms. In particular, we identify two dual parameterizations of such systems: (i) a synthesis form with an $\ell_1$-penalty that involves some learnable dictionary; and (ii) an analysis form with an $\ell_\infty$-penalty that involves a trainable regularization operator. After having provided geometric insights and proved that the two forms are universal, we propose an implementation that relies on a specific architecture (tight frame with a weighted $\ell_1$ penalty) that is easy to train. We illustrate its use for denoising and the reconstruction of biomedical images. We find that the proposed framework outperforms the sparsity-based methods of compressed sensing, while it offers essentially the same convergence and robustness guarantees.
On robust recovery of signals from indirect observations
Bekri, Yannis, Juditsky, Anatoli, Nemirovski, Arkadi
We consider an uncertain linear inverse problem as follows. Given observation $\omega=Ax_*+\zeta$ where $A\in {\bf R}^{m\times p}$ and $\zeta\in {\bf R}^{m}$ is observation noise, we want to recover unknown signal $x_*$, known to belong to a convex set ${\cal X}\subset{\bf R}^{n}$. As opposed to the "standard" setting of such problem, we suppose that the model noise $\zeta$ is "corrupted" -- contains an uncertain (deterministic dense or singular) component. Specifically, we assume that $\zeta$ decomposes into $\zeta=N\nu_*+\xi$ where $\xi$ is the random noise and $N\nu_*$ is the "adversarial contamination" with known $\cal N\subset {\bf R}^n$ such that $\nu_*\in \cal N$ and $N\in {\bf R}^{m\times n}$. We consider two "uncertainty setups" in which $\cal N$ is either a convex bounded set or is the set of sparse vectors (with at most $s$ nonvanishing entries). We analyse the performance of "uncertainty-immunized" polyhedral estimates -- a particular class of nonlinear estimates as introduced in [15, 16] -- and show how "presumably good" estimates of the sort may be constructed in the situation where the signal set is an ellitope (essentially, a symmetric convex set delimited by quadratic surfaces) by means of efficient convex optimization routines.
Residual Random Neural Networks
The single-layer feedforward neural network with random weights is a recurring motif in the neural networks literature. The advantage of these networks is their simplified training, which reduces to solving a ridge-regression problem. A general assumption is that these networks require a large number of hidden neurons relative to the dimensionality of the data samples, in order to achieve good classification accuracy. Contrary to this assumption, here we show that one can obtain good classification results even if the number of hidden neurons has the same order of magnitude as the dimensionality of the data samples, if this dimensionality is reasonably high. Inspired by this result, we also develop an efficient iterative residual training method for such random neural networks, and we extend the algorithm to the least-squares kernel version of the neural network model. Moreover, we also describe an encryption (obfuscation) method which can be used to protect both the data and the resulted network model.
k-Support and Ordered Weighted Sparsity for Overlapping Groups: Hardness and Algorithms
We study the norms obtained from extending the k-support norm and OWL norms to the setting in which there are overlapping groups. The resulting norms are in general NP-hard to compute, but they are tractable for certain collections of groups. To demonstrate this fact, we develop a dynamic program for the problem of projecting onto the set of vectors supported by a fixed number of groups.
Lecture notes on high-dimensional data
The text below arose from a course on'Mathematical Data Science' that I taught twice for final year BSc Mathematics students in the UK between 2019 and 2020. The notes presently cover the first part (roughly a third) of the course focussing on the characteristics and peculiarities of high-dimensional data. An improved version of the notes appeared as part of the textbook [7]; we refer the reader in particular to [7, Chapters 8 -12]. I would like to thank my former students who attended the course and helped me with their feedback to write these lecture notes. Concrete examples are as follows. Each user can give a rating from one to five stars for each movie. When doing medical diagnostic tests, we can represent a subject by the vector containing her/his results. These can include integers like antibody counts, real numbers like temperature, pairs of real numbers like blood pressure, or binary values like if a subject has tested positive or negative for a certain infection. If we name the users 1, 2, 3,..., we can represent user j in R Given such a high-dimensional data set A, classical tasks to analyze the data, or make predictions based on it, involve to compute distances between data points. This can be for example the classical euclidean distance (or any other p-norm), CHAPTER 1. THE CURSE OF HIGH DIMENSIONS 4 However, if d is very large, we are faced with the following two obstructions.